Includes material from Ian Dworkin and Jonathan Dushoff, but they bear no responsibility for the contents.
From ?lme4::sleepstudy, “The average reaction time per day (in milliseconds) for subjects in a sleep deprivation study.” A standard example for mixed models.
library(lme4)
library(brms)
options(brms.backend = "cmdstanr")
library(broom.mixed)
library(purrr)
library(dplyr)
library(tidybayes)
library(bayesplot)
library(bayestestR)
library(ggplot2); theme_set(theme_bw())
library(see)
options(ggplot2.discrete.colour= scale_color_okabeito)
library(ggrastr)
library(cowplot)
We’ll fit the model Reaction ~ Days + (Days|Subject) (a linear regression + random-slopes model)
\[ \newcommand{\XX}{\mathbf X} \newcommand{\ZZ}{\mathbf Z} \newcommand{\bbeta}{\boldsymbol \beta} \newcommand{\bb}{\mathbf b} \newcommand{\zero}{\boldsymbol 0} \begin{split} \textrm{Reaction} & \sim \textrm{Normal}(\XX \bbeta + \ZZ \bb, \sigma^2) \\ \bb & \sim \textrm{MVNorm}(\zero, \Sigma) \\ \Sigma & = \textrm{block-diag}, \left( \begin{array}{cc} \sigma^2_i & \sigma_{is} \\ \sigma_{is} & \sigma^2_{s} \end{array} \right) \end{split} \]
form1 <- Reaction ~ Days + (Days|Subject)
brms has a convenience function get_prior() that lays out the parameters/priors that need to be set, and their default values …
get_prior(form1, sleepstudy)
## prior class coef group resp dpar nlpar lb ub source
## (flat) b default
## (flat) b Days (vectorized)
## lkj(1) cor default
## lkj(1) cor Subject (vectorized)
## student_t(3, 288.7, 59.3) Intercept default
## student_t(3, 0, 59.3) sd 0 default
## student_t(3, 0, 59.3) sd Subject 0 (vectorized)
## student_t(3, 0, 59.3) sd Days Subject 0 (vectorized)
## student_t(3, 0, 59.3) sd Intercept Subject 0 (vectorized)
## student_t(3, 0, 59.3) sigma 0 default
Info on brms default priors: Intercept is Student \(t\) with 3 df, mean equal to observed mean (median??), SD equal to (rescaled) MAD. RE SDs are the same but half-\(t\) (see lb = 0 column), mode at 0.
Constraining intercept and slope to ‘reasonable’ values:
b_prior <- c(set_prior("normal(200, 50)", "Intercept"),
set_prior("normal(0, 10)", "b")
)
Helper function for prior predictive simulations: run a short MCMC chain, plot results. (We’re going to ignore/suppress warnings for this stage …)
test_prior <- function(p) {
## https://discourse.mc-stan.org/t/suppress-all-output-from-brms-in-markdown-files/30117/3
capture.output(
b <- brm(form1, sleepstudy, prior = p,
seed = 101, ## reproducibility
sample_prior = 'only', ## for prior predictive sim
chains = 1, iter = 500, ## very short sample for convenience
silent = 2, refresh = 0 ## be vewy vewy quiet ...
)
)
p_df <- sleepstudy |> add_predicted_draws(b)
## 'spaghetti plot' of prior preds
gg0 <- ggplot(p_df,aes(Days, .prediction, group=interaction(Subject,.draw))) +
geom_line(alpha = 0.1)
print(gg0)
invisible(b) ## return without printing
}
test_prior(b_prior)
Decrease random-effects SDs:
b_prior2 <- c(set_prior("normal(200, 10)", "Intercept"),
set_prior("normal(0, 5)", "b"),
set_prior("student_t(3, 0, 0.1)", "sd")
)
test_prior(b_prior2)
Make all scales even smaller?
b_prior3 <- c(set_prior("normal(200, 5)", "Intercept"),
set_prior("normal(0, 2)", "b"),
set_prior("student_t(3, 0, 0.01)", "sd")
)
test_prior(b_prior3)
We forgot to constrain the prior for the residual standard deviation!
b_prior4 <- c(set_prior("normal(200, 5)", "Intercept"),
set_prior("normal(0, 2)", "b"),
set_prior("normal(0, 1)", "sd"),
set_prior("normal(0, 1)", "sigma")
)
test_prior(b_prior4)
Now relax a bit …
b_prior5 <- c(set_prior("normal(200, 10)", "Intercept"),
set_prior("normal(0, 8)", "b"),
set_prior("student_t(10, 0, 3)", "sd"),
set_prior("student_t(10, 0, 3)", "sigma")
)
test_prior(b_prior5)
In hindsight, should relax more. Set intercept back to default value, widen fixed-effect (slope) prior
b_prior6 <- c(set_prior("normal(0, 20)", "b"),
set_prior("student_t(10, 0, 3)", "sd"),
set_prior("student_t(10, 0, 3)", "sigma")
)
test_prior(b_prior6)
There ae still a few negative reaction times left, which is obviously unrealistic, but at least they’re rare …
adapt_delta below is one knob to turn to make the MCMC ‘work harder’ to avoid geometry problemsbrms does a lot of fancy stuff under the hood to try to make things work wellm_lmer <- lmer(form1, sleepstudy)
b_reg <- brm(form1, sleepstudy, prior = b_prior5,
seed = 101,
control = list(adapt_delta = 0.95)
)
b_reg2 <- brm(form1, sleepstudy, prior = b_prior6,
seed = 101,
control = list(adapt_delta = 0.95)
)
## also try with default settings
b_default <- brm(form1, sleepstudy,
seed = 101
)
(These are actually run as a batch from run_examples.R and loaded here …)
load("examples1.rda")
Should really diagnose before looking at the parameter values! MCSE is the Monte Carlo Standard Error, equal to (std err)/sqrt(ESS).
print(bayestestR::diagnostic_posterior(b_reg),
digits = 4)
## Parameter Rhat ESS MCSE
## 1 b_Days 1.001 1185 0.08135
## 2 b_Intercept 1.002 1152 0.28076
\(\hat R\) (Rhat), ESS look OK. Vehtari et al. (2021) recommend an \(\hat R\) threshold of 1.01, ESS > 400, MCSE (Monte Carlo standard error) ‘small enough’ for scientific purposes (“figure out what is the needed accuracy for the quantity of interest (for reporting usually 2 significant digits is enough”)
regex_pars specifies a regular expression for deciding which parameters to show
mcmc_trace(b_reg, regex_pars= "b_|sd_")
Vehtari et al. (2021) recommend rank-histograms instead: chains should have similar rank distributions
mcmc_rank_overlay(b_reg, regex_pars= "b_|sd_")
summary(b_reg)
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: Reaction ~ Days + (Days | Subject)
## Data: sleepstudy (Number of observations: 180)
## Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup draws = 4000
##
## Group-Level Effects:
## ~Subject (Number of levels: 18)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 20.69 7.76 6.17 37.31 1.00 865 577
## sd(Days) 10.89 2.68 6.51 16.76 1.00 1212 2037
## cor(Intercept,Days) 0.50 0.25 -0.04 0.91 1.00 703 786
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 233.85 9.53 213.16 250.81 1.00 1064 934
## Days -0.42 2.80 -5.97 4.76 1.00 1224 1832
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 25.69 1.63 22.87 29.21 1.00 2231 2618
##
## Draws were sampled using sample(hmc). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
Plot results. broom.mixed::tidy() will get what you need - complicated code below is to get everything lined up nicely.
brms_modlist <- list(brms_default = b_default, brms_reg = b_reg, brms_reg2 = b_reg2)
res_bayes <- (brms_modlist
|> purrr::map_dfr(~ tidy(., conf.int = TRUE), .id = "model")
)
## need to do separately - different conf.method choices
res_lmer <- suppressMessages(m_lmer
|> tidy(conf.int = TRUE, conf.method = "profile")
|> mutate(model = "lmer", .before = 1)
)
res <- (bind_rows(res_lmer, res_bayes)
|> select(-c(std.error, statistic, component, group))
|> filter(term != "(Intercept)")
|> mutate(facet = ifelse(grepl("^cor", term), "cor",
ifelse(grepl("Days", term), "Days",
"int")))
|> mutate(across(model, ~ factor(., levels = c("lmer", names(brms_modlist)))))
)
ggplot(res, aes(estimate, term, colour = model, shape = model)) +
geom_pointrange(aes(xmin = conf.low, xmax = conf.high),
position = position_dodge(width = 0.5)) +
facet_wrap(~ facet, scale = "free",ncol = 1) +
guides(colour = guide_legend(reverse=TRUE),
shape = guide_legend(reverse=TRUE))
post_df1 <- sleepstudy |> add_predicted_draws(b_reg)
gg1 <- ggplot(post_df1,
aes(Days, .prediction, group=interaction(Subject,.draw))) +
rasterise(geom_line(alpha = 0.1)) +
geom_point(aes(y=Reaction), col = "red") +
labs(y = "Reaction time")
print(gg1 + labs(title = "tighter priors (weird)"))
post_df2 <- sleepstudy |> add_predicted_draws(b_reg2)
print(gg1 %+% post_df2 + labs(title = "looser priors"))
Compare just Subject 1 results:
post_df1B <- filter(post_df1, Subject == levels(Subject)[1])
post_df2B <- filter(post_df2, Subject == levels(Subject)[1])
plot_grid(gg1 %+% post_df1B + labs(title = "tighter"),
gg1 %+% post_df2B + labs(title = "looser"))
Bottom line: b_prior5/b_reg give “wrong” answers (priors are too tight), b_prior6/b_reg2 are OK, but the predictions are nearly identical. This turns out to be because the random effects are taking up the slack in the
Plot random effect draws, from here:
Random effects (deviations from population-level/fixed-effect predictions):
brms_modlist <- list(brms_default = b_default, brms_reg = b_reg, brms_reg2 = b_reg2)
ranefs <- (brms_modlist
|> purrr::map_dfr(~ tidy(., effects = "ran_vals", conf.int = TRUE), .id = "model")
)
gg_r <- ggplot(ranefs, aes(estimate, level, colour = model)) +
geom_pointrange(aes(xmin = conf.low, xmax = conf.high), position = position_dodge(width = 0.5)) +
facet_wrap(~term, scale = "free", nrow = 1)
print(gg_r + geom_vline(lty = 2, xintercept = 0))
Group-level coefficients (predicted intercept/slope for each subject):
## this is way too ugly - needs to be incorporated into broom.mixed as an option
my_coefs <- function(x) {
meltfun <- function(a) {
dd <- as.data.frame(ftable(a)) |>
setNames(c("level", "var", "term", "value")) |>
tidyr::pivot_wider(names_from = var, values_from = value) |>
rename(estimate = "Estimate",
std.error = "Est.Error",
## FIXME: not robust to changing levels
conf.low = "Q2.5",
conf.high = "Q97.5")
}
purrr:::map_dfr(coef(x), meltfun, .id = "group")
}
coefs <- (brms_modlist
|> purrr::map_dfr(my_coefs, .id = "model")
)
print(gg_r %+% coefs)
Compare means of random effects (should be zero!)
ranefs |>
group_by(model, term) |>
summarise(mean = mean(estimate),
se = sd(estimate)/n(),
.groups = "drop")
## # A tibble: 6 × 4
## model term mean se
## <chr> <chr> <dbl> <dbl>
## 1 brms_default (Intercept) 0.106 1.18
## 2 brms_default Days 0.0887 0.302
## 3 brms_reg (Intercept) 16.4 0.868
## 4 brms_reg Days 10.9 0.328
## 5 brms_reg2 (Intercept) -0.0339 0.731
## 6 brms_reg2 Days 0.0489 0.309
## to plot:
## |>
## ggplot(aes(mean, term, colour = model)) +
## geom_pointrange(aes(xmin=mean-se, xmax = mean + se), position = position_dodge(width = 0.5))
check_prior(b_reg)
try(check_prior(b_reg2)) ## ugh
debug(bayestestR:::.check_prior)
try(check_prior(b_reg2, method = "lakeland")) ## ugh
Bolker, Benjamin M., Beth Gardner, Mark Maunder, Casper W. Berg, Mollie Brooks, Liza Comita, Elizabeth Crone, et al. 2013. “Strategies for Fitting Nonlinear Ecological Models in R, AD Model Builder, and BUGS.” Edited by Satu Ramula. Methods in Ecology and Evolution 4 (6): 501–12. https://doi.org/10.1111/2041-210X.12044.
Vehtari, Aki, Andrew Gelman, Daniel Simpson, Bob Carpenter, and Paul-Christian Bürkner. 2021. “Rank-Normalization, Folding, and Localization: An Improved R-hat for Assessing Convergence of MCMC (with Discussion).” Bayesian Analysis 16 (2): 667–718. https://doi.org/10.1214/20-BA1221.
Last updated: 01 June 2023 20:33